442 research outputs found

    clValid: An R Package for Cluster Validation

    The R package clValid contains functions for validating the results of a clustering analysis. There are three main types of cluster validation measures available, "internal", "stability", and "biological". The user can choose from nine clustering algorithms in existing R packages, including hierarchical, K-means, self-organizing maps (SOM), and model-based clustering. In addition, we provide a function to perform the self-organizing tree algorithm (SOTA) method of clustering. Any combination of validation measures and clustering methods can be requested in a single function call. This allows the user to simultaneously evaluate several clustering algorithms while varying the number of clusters, to help determine the most appropriate method and number of clusters for the dataset of interest. Additionally, the package can automatically make use of the biological information contained in the Gene Ontology (GO) database to calculate the biological validation measures, via the annotation packages available in Bioconductor. The function returns an object of S4 class "clValid", which has summary, plot, print, and additional methods which allow the user to display the optimal validation scores and extract clustering results.
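
    As a rough illustration of what an "internal" validation measure does, the sketch below scores two candidate partitions of the same points and keeps the better one. It is written in Python rather than R and does not call clValid. The abstract does not name the individual measures, so the Dunn index is used here as an assumed example, and `dunn_index`, `points`, and `candidates` are hypothetical names.

```python
from itertools import combinations
from math import dist, inf


def dunn_index(points, labels):
    """Smallest between-cluster distance divided by the largest cluster diameter."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("the Dunn index needs at least two clusters")

    # Largest distance between two points that share a cluster.
    max_diameter = max(
        (dist(a, b) for members in clusters.values()
         for a, b in combinations(members, 2)),
        default=0.0,
    )
    # Smallest distance between two points in different clusters.
    min_separation = min(
        dist(a, b)
        for first, second in combinations(list(clusters.values()), 2)
        for a in first
        for b in second
    )
    return inf if max_diameter == 0 else min_separation / max_diameter


# Toy data (assumed): two tight pairs of points, far apart horizontally.
points = [(0, 0), (0, 1), (5, 0), (5, 1)]
candidates = {
    "left/right split": [0, 0, 1, 1],
    "top/bottom split": [0, 1, 0, 1],
}
scores = {name: dunn_index(points, labels) for name, labels in candidates.items()}
best = max(scores, key=scores.get)  # higher Dunn index = more compact, better separated
```

    Scoring several partitions with one measure and picking the best mirrors, in miniature, the comparison across algorithms and cluster counts that the abstract describes.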

    An Analysis of Using Time-Series Current and Deferred Income Tax Expense to Forecast Income Taxes Paid

    Prior research, using cross-sectional data, concluded that interperiod income tax allocation is useful in forecasting income tax payments (Murdoch, Costa, & Krause, 1994; Cheung, Krishnan, & Min, 1997). Both of these articles suggested that future research should investigate whether time-series data are also useful in forecasting income tax payments. This paper uses time-series data from 235 Compustat firms over a 20-year period to evaluate whether income tax expense is useful in forecasting one-, two-, and three-year-ahead income tax payments. We conclude that firms’ predictions are more accurate for shorter forecast horizons. Additionally, we determine that deferred income tax expense enhances the ability of current income tax expense to predict future tax payments for approximately 40% of firms across all three forecast horizons. Furthermore, we find that the prediction accuracy of a firm’s one-year-ahead forecasts is significantly related to the prediction accuracy of its two- and three-year-ahead forecasts.

    BayesMetab: treatment of missing values in metabolomic studies using a Bayesian modeling approach

    Background: With the rise of metabolomics, the development of methods to address analytical challenges in the analysis of metabolomics data is of great importance. Missing values (MVs) are pervasive, yet the treatment of MVs can have a substantial impact on downstream statistical analyses. The MVs problem in metabolomics is quite challenging and can arise because the metabolite is not biologically present in the sample, or is present in the sample but at a concentration below the lower limit of detection (LOD), or is present in the sample but undetected due to technical issues related to sample pre-processing steps. The first two cases are considered missing not at random (MNAR), while the last is an example of missing at random (MAR). Typically, such MVs are substituted by a minimum value, which may lead to severely biased results in downstream analyses.
    Results: We develop a Bayesian model, called BayesMetab, that systematically accounts for missing values based on a Markov chain Monte Carlo (MCMC) algorithm that incorporates data augmentation by allowing MVs to be due to either truncation below the LOD or other technical reasons unrelated to its abundance. Based on a variety of performance metrics (power for detecting differential abundance, area under the curve, bias and MSE for parameter estimates), our simulation results indicate that BayesMetab outperformed other imputation algorithms when there is a mixture of missingness due to MAR and MNAR. Further, our approach was competitive with other methods tailored specifically to MNAR in situations where missing data were completely MNAR. Applying our approach to an analysis of metabolomics data from a mouse myocardial infarction revealed several statistically significant metabolites not previously identified that were of direct biological relevance to the study.
    Conclusions: Our findings demonstrate that BayesMetab has improved performance in imputing the missing values and performing statistical inference compared to other current methods when missing values are due to a mixture of MNAR and MAR. Analysis of real metabolomics data strongly suggests this mixture is likely to occur in practice, and thus, it is important to consider an imputation model that accounts for a mixture of missing data types.
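
    The bias from minimum-value substitution can be seen in a toy calculation. The Python sketch below is not BayesMetab and does no Bayesian modeling. It assumes that "a minimum value" means the smallest observed value, that every value below the LOD goes missing, and it uses made-up numbers; `min_value_impute` is a hypothetical helper.

```python
from statistics import mean


def min_value_impute(values, lod):
    """Replace every value below `lod` (treated as missing) with the smallest observed value."""
    observed = [v for v in values if v >= lod]
    if not observed:
        raise ValueError("no observed values at or above the LOD")
    floor = min(observed)
    return [v if v >= lod else floor for v in values]


# Made-up true abundances; with an LOD of 3.5 the first three would go undetected.
true_values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
imputed = min_value_impute(true_values, lod=3.5)

# Every censored value is pulled up to the observed minimum, so the mean is overestimated.
bias = mean(imputed) - mean(true_values)
```

    Here the three censored values are all replaced by 4.0, which raises the estimated mean from 3.5 to 4.5 and removes the spread among the low values.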

    Surgical resection for hilar cholangiocarcinoma: experience improves resectability

    Objectives: In hilar cholangiocarcinoma, resection provides the only opportunity for longterm survival. A US experience of hilar cholangiocarcinoma was examined to determine the effect of clinical experience on negative margin (R0) resection rates.
    Methods: We conducted a retrospective analysis of 110 consecutive hilar cholangiocarcinoma patients presenting over an 18-year period. Analyses were performed using chi-squared, Wilcoxon rank sum and Kaplan–Meier methods, and multivariable Cox and logistic regression modelling.
    Results: Of the 110 patients in the cohort, 59.1% were male and 90.9% were White. The median patient age was 64 years. A total of 59 (53.6%) patients underwent resection; 37 of these demonstrated R0. The 30-day mortality rate was 5.1%; the complication rate was 39.0%. The rate of resectability increased over time (36.4% vs. 70.9%; P = 0.001), as did the percentage of R0 resections (10.9% vs. 56.5%; P < 0.001). Of the 59 patients who underwent resection, 23 (39.0%) experienced recurrence. Multivariable Cox regression analysis identified resection margins [hazard ratio (HR) = 4.124 for positive vs. negative; P = 0.002] and type of operation (HR = 5.075 for exploration vs. resection; P = 0.001) as significant to survival.
    Conclusions: Although R0 resection can be achieved in only a minority of patients, these patients have a reasonable chance of longterm survival. The last decade has seen a significant rise in rates of resectability of Klatskin's tumour at specialty centres.

    A Novel Classification System to Address Financial Impact and Referral Decisions for Bile Duct Injury in Laparoscopic Cholecystectomy

    Purpose. The study was undertaken to evaluate a novel classification system developed to estimate financial cost of bile duct injury (BDI) and to aid in decision making for referral. Study Design. A retrospective review of patients referred for BDI was performed. Grade I injuries involve the duct of Luschka or accessory right hepatic ducts, grade II includes all other biliary injuries, and grade III includes all vasculobiliary injuries. Groups were compared using standard statistical methods. Results. There were 14 grade I, 74 grade II, and 20 grade III injuries. There was a significant difference in the cost and mortality of grade I ($12,457, 0%), grade II ($46,481, 1.4%), and grade III ($69,368, 15%; P = 0.002 and P = 0.030, resp.) injuries. Grade II and III injuries were significantly more likely to require surgical repair (OR 27.7, P < 0.001). Conclusion. We have presented a simple classification system that is able to accurately predict cost and need for surgical repair.
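
    The three-grade rule stated in the abstract can be written as a small decision function. The Python sketch below is illustrative only and is not the authors' instrument. The site strings, the `vascular_injury` flag, and the choice to check for a vasculobiliary injury first are assumptions drawn from the one-sentence definition above.

```python
from enum import IntEnum


class Grade(IntEnum):
    I = 1
    II = 2
    III = 3


# Hypothetical labels for the two grade I injury sites named in the abstract.
GRADE_I_SITES = {"duct of luschka", "accessory right hepatic duct"}


def classify_bdi(site: str, vascular_injury: bool) -> Grade:
    """Assign a grade from the injured duct and whether a vascular injury is also present."""
    if vascular_injury:
        # "Grade III includes all vasculobiliary injuries", so this check comes first.
        return Grade.III
    if site.strip().lower() in GRADE_I_SITES:
        return Grade.I
    # Every other purely biliary injury.
    return Grade.II
```

    For example, `classify_bdi("Duct of Luschka", vascular_injury=False)` returns `Grade.I`, while the same site with `vascular_injury=True` returns `Grade.III`.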

    Farm Turnout Flow Recommendations for New Outlets in Cameron County Irrigation District No. 2

    The Bureau of Reclamation requested recommendations on flow rates and capacity requirements of new farm turnouts in Cameron County Irrigation District No. 2. These outlets are being designed as part of a rehabilitation project which is replacing unlined canals with new underground pipelines in the portion of the district shown. A portion of this study was funded by Texas Cooperative Extension through the Rio Grande Basin Initiative administered by the Texas Water Resources Institute of the Texas A&M University System with funds provided through a grant from Cooperative State Research, Education, and Extension Service, U.S. Department of Agriculture, under Agreement No. 2001-001-45049-01149.

    Polychrome: Creating and Assessing Qualitative Palettes with Many Colors

    Although R includes numerous tools for creating color palettes to display continuous data, facilities for displaying categorical data primarily use the RColorBrewer package, which is, by default, limited to 12 colors. The colorspace package can produce more colors, but it is not immediately clear how to use it to produce colors that can be reliably distinguished in different kinds of plots. However, applications to genomics would be enhanced by the ability to display at least the 24 human chromosomes in distinct colors, as is common in technologies like spectral karyotyping. In this article, we describe the Polychrome package, which can be used to construct palettes with at least 24 colors that can be distinguished by most people with normal color vision. Polychrome includes a variety of visualization methods allowing users to evaluate the proposed palettes. In addition, we review the history of attempts to construct qualitative color palettes with many colors.

    Evaluating snow microbial assemblages

    Psychrophiles are organisms that grow optimally below 20 °C (1). The US Great Basin is home to many mountain peaks with an abundance of alpine snow environments perfect for psychrophilic habitation. We analyzed samples from three different locations, Wheeler Peak, Pacific Crest Trail, and Mount Conness, characterizing and comparing the psychrophilic communities at varying depth intervals in the snow. Polymerase chain reaction (PCR) and denaturing gradient gel electrophoresis (DGGE) showed no notable difference in community structure with depth, but there was a distinct difference when comparing different snow environments (i.e. shaded vs. full sun exposure). The chlorophyll concentration decreased as the depth of the snow increased. By creating a clone library and utilizing DNA sequencing technology we were able to obtain 16S and 18S rRNA gene sequences from samples collected from Mount Conness, which allowed us to identify microbes living in the ecosystem. This information enabled us to produce bacterial and eukaryal phylogenetic trees, giving us a clear look into the diversity of this psychrophilic community. Out of seventy bacterial results there were fifty-three ‐Proteobacteria, thirteen Sphingobacteria, and only three Actinobacteria, with one unclassified bacterium as well. These results will guide us in our future plans for experimentation.